A corpus-driven approach to discourse organisation: from cues to complex markers

Authors

  • Marie-Paule Péry-Woodley
  • Lydia-Mai Ho-Dac
  • Josette Rebeyrolle
  • Ludovic Tanguy
  • Cécile Fabre

Abstract

This paper reports on an experiment implementing a data-intensive approach to discourse organisation. Its focus is on enumerative structures envisaged as a type of textual pattern in a sequentiality-oriented approach to discourse. On the basis of a large-scale annotation exercise calling upon automatic feature mark-up alongside manual annotation, we explore a method to identify complex discourse markers seen as configurations of cues. The presentation of the background to what is termed “multi-level annotation” is organised around four issues: linearity, complexity of discourse markers, top-down processing, granularity and the multi-level nature of discourse structures. In this context, enumerative structures seem to deserve scrutiny for a number of reasons: they are frequent structures appearing at different granularity levels, they are signalled by a variety of devices appearing to work together in complex ways, and they combine a textual role (discourse organisation) with an ideational role (categorisation). We describe the annotation procedure and experimental framework which resulted in nearly 1,000 enumerative structures being annotated in a diversified corpus of over 600,000 words. The results of two approaches to the rich data produced are then presented: firstly, a descriptive survey highlights considerable variation in length and composition, while showing enumerative structure to be a basic strategy resorted to in all three sub-corpora, and leads to a granularity-based typology of the annotated structures; secondly, recurrent cue configurations (our “complex markers”) are identified by the application of data mining methods. The paper ends with perspectives for further exploitation of the data, in particular with respect to the semantic characterisation of enumerative structures.

Similar resources

Corpus Annotation of Macro Discourse Structures

We present our discourse annotation project, ANNODIS, which aims to make available a diversified French corpus annotated with discourse information, along with a set of tools for annotation and corpus exploitation. An original aspect of the project is that it combines two theoretically and methodologically different points of view on discourse: bottom-up and top-down. In the bottom-up perspecti...

STANCE AND ENGAGEMENT DISCOURSE MARKERS IN JOURNAL’S “AUTHOR GUIDELINES”

Over the past decade, there has been an increasing interest in the study of interactional metadiscourse markers in different contexts. However, not much research has been conducted about the discourse of journal author guidelines, especially the use of meta-discourse markers in this genre. Therefore, this corpus-based study had three main aims: 1) to delve deep into the types, frequencies and f...

An empirical resource for discovering cognitive principles of discourse organisation: the ANNODIS corpus

We describe the Annodis corpus of discourse structures for French. The corpus joins two perspectives on discourse on a variety of textual genres: a bottom-up approach and a top-down approach. The bottom-up view builds incrementally a structure from elementary discourse units, while the top-down view focuses on the selective annotation of multi-level discourse structures. The corpus is composed ...

Metadiscourse Markers in a Corpus of Learner Language: The Case of Iranian EFL Learners

Different issues have been probed in learner corpus research since the late 1980s. However, given the importance of metadiscourse markers (MDMs) in signposting academic discourse, their use in Iranian EFL learners' academic essays is an area of research in need of a more serious analysis. Contributing to this line of investigation, this paper reports a corpus-based study of the use of MDMs i...

A Corpus-based Study of Lexical Bundles in Discussion Section of Medical Research Articles

There has been increasing interest in utilizing corpora in linguistic research and pedagogy in recent years. Rhetorical organization of different sections of research articles may appear similar in various disciplines, but close examination may show subtle differences nonetheless. One of the features that has been at the center of attention especially in recent years is the idiomaticity of a di...

Journal title:
  • D&D

Volume 8

Publication date: 2017